Abstract — We formulate and analyze a stochastic threshold group testing problem motivated by biological applications. Here a set of $n$ items contains a subset of $d \ll n$ defective items. Subsets (pools) of the $n$ items are tested. The test outcomes are negative, positive, or stochastic (negative or positive with certain probabilities that might depend on the number of defectives being tested in the pool), depending on whether the number of defective items in the pool being tested is at most the lower threshold $l$, at least the upper threshold $u$, or in between. The goal of a stochastic threshold group testing scheme is to identify the set of $d$ defective items via a "small" number of such tests. In the regime $l = o(d)$ we present schemes that are computationally feasible to design and implement and that require a near-optimal number of tests (significantly improving on existing schemes). Our schemes are robust to a variety of models for probabilistic threshold group testing.

I. Introduction 

Classical Group Testing: The set $\mathcal{N}$ of $n$ items contains a set $\mathcal{D}$ of $d$ "defectives"; here $d$ is assumed to be $o(n)$. The classical version of the group testing problem was first considered by Dorfman in 1943 [1] as a means of identifying a small number of diseased individuals from a large population via as few "pooled tests" as possible. In this scenario, blood from a subset of individuals is pooled together and tested: if none of the individuals in a pool has the disease, the test outcome is "negative", else it is "positive". In the non-adaptive group testing problem, each test is designed independently of the outcome of any other test, whereas in adaptive group testing the tests may be designed sequentially. For both problems, $O(d\log n)$ tests are known to be necessary and sufficient; a good survey of algorithms and bounds can be found in the books by Du and Hwang [2], [3], and in the paper by Chen and Hwang [4].

Threshold Group Testing: In this work we focus on a generalization of classical group testing called threshold group testing, first considered by Damaschke [5]. The difference is that the outcome of each pooled test is "positive" if the number of defectives in the test is no smaller than the upper threshold (denoted $u$), "negative" if no more than the lower threshold (denoted $l$) defectives are contained in the test, and arbitrary ("worst-case") otherwise. Clearly, when $u = 1$ and $l = 0$, this reduces to the classical group testing problem. There are other generalizations of classical group testing [6], [7], [8]. Applications of the threshold group testing model include the problem of reconstructing a hidden hypergraph [9], [10], [11], [12], and a searching problem called "guessing secrets" [5], [13].



The first adaptive algorithm for threshold group testing was proposed in [5]. When the gap $g$ (defined as $u - l - 1$, the difference between the upper and lower thresholds) equals 0, the number of tests needed to identify the set of defectives is $O((d + u^2)\log n)$. When $g > 0$, the number of tests required by [5] scales as $O(dn^{b} + d^{u})$ if $g + (u-1)/b$ misclassifications are allowed (here $b > 0$ is an arbitrary constant), with polynomial-time decoding complexity. The work of [12] showed that $O(e\,d^{u+1}\log(n/d))$ non-adaptive threshold tests suffice to identify the set of defectives with up to $g$ misclassifications and $e$ erroneous tests allowed. The computational complexity of decoding is $O(n^{u}\log n)$ for fixed $(d, e)$. In [9], a probabilistic construction of a weaker version of disjunct matrices is used instead of the strongly disjunct matrices used in [12]. This reduces the number of tests from $O(d^{u+1}\log(n/d))$ to $O(d^{g+2}\log d\,\log(n/d))$. Two explicit constructions are also proposed, using $O(d^{g+3}(\log d)\,\mathrm{quasipoly}(\log n))$ and $O(d^{g+3+\beta}\,\mathrm{poly}(\log n))$ tests (for arbitrary $\beta > 0$). However, the computational complexity of decoding is not addressed. Also, [14] draws a connection between "threshold codes", non-adaptive threshold group testing, and a model called "majority group testing".

These works share two modeling assumptions.

(A) Worst-case Model: If the number of defective items in a pool is between the upper and lower thresholds ("in the gap"), then the test outcome is assumed to be arbitrary. Algorithms must therefore be designed to account for a malicious adversary that can set test outcomes to maximally confuse the threshold group testing scheme.

(B) Zero-error (with misclassifications): The algorithm is required to guarantee (with probability 1) that the output is "correct", i.e., that it contains the set of defective items up to a certain number of misclassifications. A fundamental consequence of these two modeling assumptions is that if the gap $g = u - l - 1 > 0$, the set of defectives cannot be exactly identified. Regardless of what algorithm is used, one can only reconstruct the set of defective items up to a certain number of misclassifications [5].

Stochastic Threshold Group Testing: We relax these aspects of the conventional setting. In particular, we relax the worst-case model to a stochastic model, and we seek probabilistic guarantees instead of absolute zero-error guarantees.
(A) Stochastic Model: This setup is motivated by a class of biological applications [15] where the test outcomes are observed to be random whenever the number of defectives in a pool falls within a given range. We consider two models. In the first model, we assume that the outcome of a test is equally likely to be positive or negative whenever the number of defectives in a pool is in the range $(l, u)$. In the second model, the probability of a positive test outcome depends on the number of defective items in the test. For concreteness, we assume that this dependence scales linearly from $l$ to $u$, though our results hold for more general models as well.¹ These two models are represented in Figures 1 and 2.
(B) Probabilistic Guarantee: We allow a "small" probability of error for our algorithm. This probability is taken with respect to both the randomness of the measurements within the gap and the test design.

Fig. 1. Bernoulli gap stochasticity: If the number of defectives present in a test is between the thresholds, the probability that the outcome is positive equals 1/2.

Fig. 2. Linear gap stochasticity: If the number of defectives present in a test is between the thresholds, the probability that the outcome is positive increases linearly.

These "natural" information-theoretic relaxations of the model result in schemes with significantly improved performance compared to prior work. In particular, our schemes require far fewer tests than prior algorithms, and they also admit computationally efficient decoding. They also extend directly to scenarios with zero gap, and to other models similar to group testing, such as Semi-Quantitative Group Testing [8].

For the stochastic threshold group-testing problem we 
present three algorithms (TGT-BERN-NONA, TGT-BERN- 
ADA, and TGT-LIN-NONA, respectively for the non- 
adaptive problem with Bernoulli gap stochasticity, adaptive 
problem with Bernoulli gap stochasticity, and non-adaptive 
problem with linear gap stochasticity). Our results are sum- 
marized as follows. 

Theorem 1: (Non-adaptive algorithm with Bernoulli gap model) For $l = o(d)$, TGT-BERN-NONA with error probability at most $\epsilon$ requires $(4e^{8}\ln(2)/\pi^{2})\ln(1/\epsilon)\sqrt{l}\,d\ln(n) + O(\ln(1/\epsilon)\,d\sqrt{l})$ tests. Its computational complexity of decoding is $O(n\ln(n) + n\ln(1/\epsilon))$.

Theorem 2: (Two-stage adaptive algorithm) For $l = o(d)$, TGT-BERN-ADA with error probability at most $\epsilon$ requires $16e^{2}d\ln(n) + O(\ln(1/\epsilon)\,d)$ tests. Its computational complexity of decoding is $O(n\ln(n) + n\ln(1/\epsilon))$.

Theorem 3: (Non-adaptive algorithm with linear gap model) TGT-LIN-NONA with error probability at most $\epsilon$ requires $O(g^{2}d\ln(n)) + O(\ln(1/\epsilon)\,d)$ tests. Its computational complexity of decoding is $O(g^{2}n\ln(n) + n\ln(1/\epsilon))$.

Remark: Note that the number of tests required by our algorithms is, in general, much smaller than that required by prior works. This demonstrates the power of exploiting the stochasticity that may naturally be inherent in the measurement model.

¹In fact, our approach works as long as there is a statistical difference between the probability of a positive test outcome when the number of defective items is within the range $(l, u)$ and when it is outside this range. Due to space limitations, in this work we focus on the two models in Figures 1 and 2.

TABLE I. Notation used frequently in this paper. We use calligraphic notation to denote sets, and boldface calligraphic notation to denote sets of sets.

Problem parameters
$\mathcal{N}$: The set of all items.
$n$: The total number of items, $n = |\mathcal{N}|$.
$\mathcal{D}$: The unknown subset of defective items.
$d$: The total number of defective items, $d = |\mathcal{D}|$.
$x_j$: The binary indicator variable corresponding to the $j$-th item.
$l$: The lower threshold.
$u$: The upper threshold.
$g$: The gap ($g \triangleq u - l - 1$) between the two thresholds $l$ and $u$.
$T$: The total number of tests.

Algorithmic parameters
$\mathcal{V}_p$: The $p$-th division of $\mathcal{N}$ into separate regions.
$\mathcal{P}_p$: The complement of $\mathcal{V}_p$, which contains the reference groups.
$P$: The total number of divisions of $\mathcal{N}$.
$\mathcal{R}_{p,r}$: The $r$-th reference group in $\mathcal{P}_p$.
$R$: The total number of reference groups in each $\mathcal{P}_p$.
$\boldsymbol{\mathcal{X}}_i$: The set of indicator groups in the $i$-th family (a partition of $\mathcal{N}$).
$\mathcal{X}_{i,k}$: The $k$-th indicator group in the $i$-th family.
$I$: The total number of families of indicator groups.
$\mathcal{X}_i^{(0)}$: A randomly picked indicator group from $\boldsymbol{\mathcal{X}}_i$.
$\boldsymbol{\mathcal{X}}^{(j)}$: The subset of indicator groups from $\bigcup_i \boldsymbol{\mathcal{X}}_i$ that include $x_j$.
$y^{(j)}_{p,r,i}$: The test outcome when the group $\mathcal{R}_{p,r} \cup \mathcal{X}_i^{(j)}$ is measured.

II. Intuition 

To build intuition for our proof techniques, consider the Bernoulli stochastic model described in Sec. I and Fig. 1. Note the discrete transition in the distribution of test outcomes for pools containing $l + 1$ defectives relative to pools containing $l$ defectives. Label negative outcomes as zero and positive outcomes as one. Then the test outcomes are identically zero for pools containing exactly $l$ defectives, so the distribution of test outcomes is concentrated at zero. For pools containing $l + 1$ defectives, the distribution of test outcomes is split equally between zero and one (assuming $g \ge 1$, so that $l + 1 < u$). We can exploit this aspect of the model as follows. Suppose we had a pool $\mathcal{R}^*$ consisting of exactly $l$ defectives. Then one could test whether an item $x_j \notin \mathcal{R}^*$ is defective by augmenting $\mathcal{R}^*$ with $x_j$ and testing the new pool $\mathcal{R}^* \cup \{x_j\}$.
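A minimal Python sketch of the Bernoulli gap model illustrates this jump. It assumes outcomes are exactly 0 at or below $l$, exactly 1 at or above $u$, and a fair coin flip in between (Fig. 1). The function names are hypothetical and not taken from the paper. A pool with exactly $l$ defectives never tests positive. Adding one more defective (when $g \ge 1$) makes about half of the outcomes positive.

```python
import random


def bernoulli_gap_test(num_defectives: int, l: int, u: int, rng: random.Random) -> int:
    """One threshold test under the Bernoulli gap model (Fig. 1).

    Returns 0 (negative) if the pool holds at most l defectives,
    1 (positive) if it holds at least u, and a fair coin flip otherwise.
    """
    if num_defectives <= l:
        return 0
    if num_defectives >= u:
        return 1
    return rng.randint(0, 1)


def positive_fraction(num_defectives: int, l: int, u: int, trials: int, seed: int = 0) -> float:
    """Empirical fraction of positive outcomes over repeated tests of one pool."""
    rng = random.Random(seed)
    hits = sum(bernoulli_gap_test(num_defectives, l, u, rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    l, u = 3, 6  # gap g = u - l - 1 = 2
    print(positive_fraction(l, l, u, 1000))      # critical pool R*: never positive
    print(positive_fraction(l + 1, l, u, 1000))  # R* plus one defective: roughly half positive
```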

To exploit this idea we have to address several issues. First, we do not actually have a candidate pool $\mathcal{R}^*$. Second, with this naive strategy the number of tests would grow linearly with the number of items $n$, even if we did have a candidate pool $\mathcal{R}^*$ containing $l$ defectives.

To address these issues we construct two distinct collections of pools based on random designs. The reference group collection, $\boldsymbol{\mathcal{R}}$, is a collection of $R$ pools, each with $nl/d$ items, such that at least one of the $R$ pools has exactly $l$ defective items in it. The idea is that, with high probability, one of the $R$ pools is a critical candidate $\mathcal{R}^*$. The second collection, the transversal design, is a family $\boldsymbol{\mathcal{X}}$ of $I$ sub-collections. Each sub-collection within this family consists of $O(d)$ disjoint pools, indexed as $\mathcal{X}_{i,k}$.




Fig. 3. An example illustrating TGT-BERN-NONA's testing/encoding scheme:
(1) The population $\mathcal{N}$ of $n$ items contains a small subset $\mathcal{D}$ of $d$ defective items (the shaded region).
(2) $\mathcal{N}$ is partitioned equally into $\mathcal{V}_1$, $\mathcal{V}_2$ and $\mathcal{V}_3$ (in this case $P = 3$).
(3) From each $\mathcal{P}_p$ (the complement of $\mathcal{V}_p$), $R$ reference groups of size $nl/d$ are picked uniformly at random (in this example, $R$ also equals 3).
(4) Independently of all prior choices, $I$ families of indicator groups $\boldsymbol{\mathcal{X}}_i$ are chosen. In each family, $\mathcal{N}$ is partitioned uniformly at random into indicator groups of size $\gamma_2 n/(d-l)$ each. Here $\gamma_2$ is a code-design parameter whose value is specified later.
(5) The complete bipartite graph shows the "cross-product" design of threshold tests of the form $\boldsymbol{\mathcal{R}} \times \boldsymbol{\mathcal{X}}$. Left (blue) nodes represent reference groups and right (red) nodes represent indicator groups. Each edge represents a threshold test on the union of the items in its two endpoints (one reference group with one indicator group).

Fig. 4. An example illustrating TGT-BERN-NONA's scheme to estimate whether a particular reference group is critical.
The top nodes are a set of randomly picked indicator groups from different families. The bottom nodes represent reference groups picked uniformly at random for a particular division (say $p = 1$). Each bottom node is connected to every top node by an edge whose type indicates the test outcome when the union of the items in the two endpoints is measured. A solid edge indicates a positive outcome and a dashed edge a negative outcome.
In this example, $\mathcal{R}_{1,1}$ has fewer than $l$ defective items. It is therefore unlikely to be paired with many indicator groups that contain enough defective items for the union to have more than $l$ defective items (here, only 2 out of 7 test outcomes are positive). At the other extreme, $\mathcal{R}_{1,3}$ has more than $l$ defective items and has a fairly high probability of giving a positive outcome (here, 6 out of 7 test outcomes are positive).
Between these two extremes, a "critical" reference group (with exactly $l$ defective items) is expected to produce positive outcomes at an intermediate rate. In this example, $\mathcal{R}_{1,2}$ gives 4 out of 7 positive outcomes, and we suppose the expected number of positive outcomes for a critical reference group lies in the range 3 to 5. Hence, the decoder declares that (only) $\mathcal{R}_{1,2}$ is critical.

Fig. 5. An example illustrating TGT-BERN-NONA's decoding scheme to determine whether particular items are defective.
Each set of top nodes denotes the indicator groups that include a particular item $x_j$; in this example we consider only the indicator groups corresponding to $x_1$ and $x_2$. The bottom node represents a critical reference group, labeled $\mathcal{R}_{1,2}$ in the figure. An edge has the same meaning as in Figure 4, i.e., a threshold test on the union of the items in the indicator group and the critical reference group.
A test combining an indicator group that contains a defective item $x_j$ with a critical reference group always involves more than $l$ defective items. Such tests are therefore expected to give a higher fraction of positive outcomes than if $x_j$ were non-defective. The reason is that, for a non-defective $x_j$, some of the corresponding indicator groups contain no defective items at all. If the size of the indicator groups is chosen carefully (about $O(n/d)$), this happens in a constant fraction of the indicator groups.
In this example, the decoder declares the 1st item non-defective (since "few" of its test outcomes are positive) and the 2nd item defective (since "many" of its test outcomes are positive).



Each disjoint pool within a sub-collection is referred to as an indicator group. Consequently, each item appears exactly once within any sub-collection and $I$ times within the entire family. The indicator groups play the role of the single item $x_j$ described in the preceding paragraphs.

Our algorithm is based on augmenting each indicator group within a sub-collection with each reference group, and testing the resulting pools. The idea of a transversal design is not new and has been used before in conventional group testing [16]. The novelty here is the cross-product, namely, testing indicator groups against a reference group collection, which results in $\boldsymbol{\mathcal{R}} \times \boldsymbol{\mathcal{X}}$ pools. Two questions remain: how to construct these collections, and how to find the defectives given the test outcomes.

To construct $\boldsymbol{\mathcal{R}}$, we begin by noting that a random group of size $nl/d$ has exactly $l$ defective items in it with probability about $1/\sqrt{l}$. The reason is that the expected number of defective items in such a group is exactly $l$. A standard analysis of the corresponding hypergeometric distribution using Stirling's approximation then shows that the probability of hitting this expectation scales as $1/\sqrt{l}$. Hence, if one chooses about $R = O(\sqrt{l})$ "candidate reference groups" $\mathcal{R}_r$, each of size $nl/d$, then with high probability at least one group $\mathcal{R}^*$ will be "critical" (have exactly $l$ defective items in it). To summarize, we select $O(\sqrt{l})$ candidate reference groups of size $nl/d$ each, and $O(d\log n)$ indicator groups of size about $n/d$ each. We then perform threshold group tests on every pair of the form $\mathcal{R}_r \cup \mathcal{X}$, for a total of about $O(\sqrt{l}\,d\log n)$ (non-adaptive) tests.
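The $1/\sqrt{l}$ scaling can be checked numerically with the exact hypergeometric probability. The sketch below is our own illustration, and its parameter values are arbitrary. It computes the probability that a group of $nl/d$ items contains exactly $l$ defectives and multiplies it by $\sqrt{l}$. The product stays roughly constant as $l$ grows, which is why $R = O(\sqrt{l})$ candidate reference groups suffice.

```python
from math import comb, sqrt


def prob_exactly(n: int, d: int, s: int, v: int) -> float:
    """Hypergeometric probability that s items drawn without replacement
    from n items (d of them defective) contain exactly v defectives."""
    return comb(d, v) * comb(n - d, s - v) / comb(n, s)


def scaled_critical_prob(n: int, d: int, l: int) -> float:
    """sqrt(l) times the probability that a group of size nl/d is critical."""
    s = n * l // d  # reference-group size nl/d (assumes d divides n*l)
    return sqrt(l) * prob_exactly(n, d, s, l)


if __name__ == "__main__":
    n, d = 20_000, 200
    for l in (4, 16, 64):
        print(l, round(scaled_critical_prob(n, d, l), 3))
```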

Our decoding algorithm hinges on identifying the critical candidate(s) $\mathcal{R}^*$. To do so, we use the statistical difference between reference groups that are critical and those that are not. When a reference group $\mathcal{R}$ and a randomly picked indicator group are tested together, the probability of observing a positive outcome is an increasing function of the number of defective items in $\mathcal{R}$. Hence the decoder performs matching and quantization, as follows. Each indicator group $\mathcal{X}$ in a large set of randomly picked indicator groups is tested together with $\mathcal{R}$. The decoder computes the empirical fraction of tests with positive outcomes. It then estimates whether $\mathcal{R}$ is critical by comparing this empirical fraction with a precomputed expected fraction for a critical reference group. By the Chernoff bound, the empirical fraction concentrates tightly around its expectation. Hence, with high probability, the decoder correctly estimates whether a given reference group is critical.

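Another challenge remains: how does one identify the defective items within $\mathcal{R}^*$ itself? The proposed solution is to divide $\mathcal{N}$ into $P$ disjoint divisions $\mathcal{V}_p$, and to sample reference groups from every $\mathcal{P}_p$, the complement of $\mathcal{V}_p$. If the decoder discovers a critical reference group $\mathcal{R}^*_p$ in $\mathcal{P}_p$, then all items in $\mathcal{N} \setminus \mathcal{R}^*_p$ ($\supseteq \mathcal{V}_p$) are decodable. Furthermore, if the decoder discovers a critical reference group $\mathcal{R}^*_p$ in every $\mathcal{P}_p$, then all items in $\mathcal{N}$ are decodable, including those in any critical reference group $\mathcal{R}^*_p$. This holds because $\mathcal{N} = \bigcup_p \mathcal{V}_p \subseteq \bigcup_p (\mathcal{N} \setminus \mathcal{R}^*_p) \subseteq \mathcal{N}$.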

If one is allowed to perform adaptive threshold group tests (even in just two stages), then one can significantly reduce the number of tests required. The idea is to use the first stage to identify critical reference groups (those with exactly $l$ defectives), and then to use only those reference groups in subsequent stages. This reduces the overall number of tests by a factor of $\sqrt{l}$, since one no longer needs to test all cross-products between all reference groups and all indicator groups.

Suppose instead that the probability of a positive test outcome scales linearly in the gap (as in Figure 2). Then two conflicting factors are at play. On the one hand, consider a test that contains $v$ defective items. There is now a large range of values of $v$ (anywhere from $l$ to $u - 1$) for which the probability of a positive outcome differs between tests with $v$ defectives and tests with $v + 1$ defectives. Hence any such group can serve as a proxy for a critically thresholded group, instead of demanding that critical groups contain exactly $l$ defective items. If the gap $g$ is "reasonably large", then a reference group of size $n(u + l)/(2d)$ falls within this range with high probability. Hence one does not need $O(\sqrt{l})$ reference groups to find good ones; a constant number suffices. On the other hand, the probability of observing a positive outcome now changes only slightly between a group containing $v$ defective items and one containing $v + 1$. In fact, this difference scales as $1/g$. To reliably detect such a slight change, one has to perform about a factor $g^2$ more tests. Hence, in the linear gap model of stochastic threshold group testing, the number of tests required differs from that of the Bernoulli gap model by a factor of $g^2/\sqrt{l}$.
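The following sketch makes the linear gap model concrete. It assumes, for illustration, that the probability of a positive outcome rises linearly from 0 at $v = l$ to 1 at $v = u$. Every one-defective step inside the gap then changes this probability by only $1/(u-l) = 1/(g+1)$. By comparison, the Bernoulli model jumps by $1/2$ at the critical point. A Hoeffding-type bound needs a number of tests that scales as the inverse square of this step. So the linear model costs roughly a factor $((g+1)/2)^2 = \Theta(g^2)$ more tests per decision. The function names are hypothetical.

```python
def linear_gap_positive_prob(v: int, l: int, u: int) -> float:
    """P(positive) for a pool with v defectives under an assumed linear gap
    model: 0 for v <= l, 1 for v >= u, and (v - l)/(u - l) in between."""
    if v <= l:
        return 0.0
    if v >= u:
        return 1.0
    return (v - l) / (u - l)


def step_sizes(l: int, u: int) -> list[float]:
    """Change in P(positive) when one defective is added, for v = l, ..., u - 1."""
    return [linear_gap_positive_prob(v + 1, l, u) - linear_gap_positive_prob(v, l, u)
            for v in range(l, u)]


def relative_test_factor(g: int) -> float:
    """Ratio of tests (linear vs. Bernoulli) implied by 1/step^2 scaling,
    with step 1/(g + 1) for the linear model and 1/2 for the Bernoulli one."""
    return ((g + 1) / 2) ** 2
```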
III. Algorithm for Theorem 1

We now formally describe the TGT-BERN-NONA algorithm that meets the conditions of Theorem 1.

Encoder/Testing scheme:

1) Let $\gamma_1 = (d + l)/(2d)$ and $P = 1/(1 - \gamma_1)$. Partition $\mathcal{N}$ into $P$ disjoint sets $\{\mathcal{V}_p : p = 1, 2, \dots, P\}$, each of size $n/P$. Denote the complement of $\mathcal{V}_p$ by $\mathcal{P}_p$, and note that every $\mathcal{P}_p$ has size $\gamma_1 n$. The encoder generates the $r$-th reference group $\mathcal{R}_{p,r}$ by randomly picking $nl/d$ distinct items from $\mathcal{P}_p$. This process is repeated $R$ times for every $\mathcal{P}_p$. The encoder thus obtains a set of reference groups $\boldsymbol{\mathcal{R}} = \{\mathcal{R}_{p,r} : p = 1, 2, \dots, P;\ r = 1, 2, \dots, R\}$.
2) Let $\gamma_2 \in (0, 1]$. For each family $i \in \{1, 2, \dots, I\}$, the encoder generates a random partition $\boldsymbol{\mathcal{X}}_i = \{\mathcal{X}_{i,k} : k = 1, 2, \dots, d - l\}$ of $\mathcal{N}$. Each $\mathcal{X}_{i,k}$, which we call an indicator group, has size $\gamma_2 n/(d - l)$. The set of all indicator groups is $\boldsymbol{\mathcal{X}} = \bigcup_i \boldsymbol{\mathcal{X}}_i$.
3) For every pair $\mathcal{R}_{p,r}$ and $\mathcal{X}_{i,k}$, the encoder performs a threshold test on $\mathcal{R}_{p,r} \cup \mathcal{X}_{i,k}$.
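The encoder steps above can be sketched in Python as follows. This is a toy illustration under simplifying assumptions of ours. Items are the integers $0, \dots, n-1$, and $2d/(d-l)$ and $nl/d$ are integers. By default $\gamma_2 = 1$, so each family is an exact partition. The divisions $\mathcal{V}_p$ are formed from a random shuffle. The name `design_tests` and its return layout are hypothetical.

```python
import random


def design_tests(n, d, l, R, I, gamma2=1.0, seed=0):
    """Toy version of the TGT-BERN-NONA test design (encoder steps 1-3).

    Returns the divisions V_p, the reference groups R_{p,r}, the indicator
    families X_i (lists of groups X_{i,k}), and the list of tests; each test
    is identified by ((p, r), i, k) and pools R_{p,r} | X_{i,k}.
    """
    if (2 * d) % (d - l) or (n * l) % d:
        raise ValueError("toy sketch assumes 2d/(d-l) and nl/d are integers")
    rng = random.Random(seed)
    items = list(range(n))

    # Step 1: P = 1/(1 - gamma1) = 2d/(d - l) divisions; P_p is the complement of V_p.
    P = (2 * d) // (d - l)
    shuffled = items[:]
    rng.shuffle(shuffled)
    divisions = [set(shuffled[p::P]) for p in range(P)]
    ref_size = n * l // d
    reference = {}
    for p, V in enumerate(divisions):
        complement = [x for x in items if x not in V]
        for r in range(R):
            reference[(p, r)] = frozenset(rng.sample(complement, ref_size))

    # Step 2: I independent random groupings into d - l indicator groups
    # of size gamma2 * n / (d - l) (an exact partition when gamma2 = 1).
    group_size = int(gamma2 * n / (d - l))
    families = []
    for _ in range(I):
        perm = items[:]
        rng.shuffle(perm)
        families.append([frozenset(perm[k * group_size:(k + 1) * group_size])
                         for k in range(d - l)])

    # Step 3: cross-product of every reference group with every indicator group.
    tests = [(key, i, k) for key in reference for i in range(I) for k in range(d - l)]
    return divisions, reference, families, tests
```

With these conventions, the number of tests is $RP(d-l)I$. For example, `design_tests(120, 12, 4, R=2, I=3)` produces $2 \cdot 3 \cdot 8 \cdot 3 = 144$ tests.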



Decoder:

1) For each $i = 1, 2, \dots, I$, let $\mathcal{X}_i^{(0)}$ be a randomly picked indicator group from $\boldsymbol{\mathcal{X}}_i$, and let $\boldsymbol{\mathcal{X}}^{(0)} = \{\mathcal{X}_i^{(0)} : i = 1, 2, \dots, I\}$. Let $y^{(0)}_{p,r,i}$ be the test outcome when the group $\mathcal{R}_{p,r} \cup \mathcal{X}_i^{(0)}$ is measured, and let $\theta_v = \Pr\big(y^{(0)}_{p,r,i} = 1 \,\big|\, |\mathcal{R}_{p,r} \cap \mathcal{D}| = v\big)$. The decoder declares $\mathcal{R}_{p,r}$ to be critical if
$$\theta_l(1 - \Delta_{<l}) \le \frac{1}{I}\sum_{i=1}^{I} y^{(0)}_{p,r,i} \le \theta_l(1 + \Delta_{>l}),$$
where $\Delta_{<l} = (\theta_l - \theta_{l-1})/(2\theta_l)$ and $\Delta_{>l} = (\theta_{l+1} - \theta_l)/(2\theta_l)$. That is, the decoder declares a reference group to be critical if the empirically observed fraction of positive test outcomes involving that reference group is close to the expected value $\theta_l$. The probability of this event is calculated in Lemma 6.
2) For each $i = 1, 2, \dots, I$, let $\mathcal{X}_i^{(j)}$ be the indicator group from $\boldsymbol{\mathcal{X}}_i$ that includes $x_j$, and let $\boldsymbol{\mathcal{X}}^{(j)} = \{\mathcal{X}_i^{(j)} : i = 1, 2, \dots, I\}$. Let $y^{(j)}_{p,r,i}$ be the test outcome when the group $\mathcal{R}_{p,r} \cup \mathcal{X}_i^{(j)}$ is measured, and let $a_w = \Pr\big(y^{(j)}_{p,r,i} = 1 \,\big|\, |\mathcal{R}_{p,r} \cap \mathcal{D}| = l,\ x_j = w\big)$. Note that $x_j$ is a binary indicator variable taking the value 1 or 0 depending on whether the item is defective or not. Consider every critical $\mathcal{R}_{p,r} \subseteq \mathcal{P}_p$ and every $x_j \in \mathcal{V}_p$. The decoder declares $x_j$ to be non-defective if
$$\sum_{i=1}^{I} y^{(j)}_{p,r,i} < |\boldsymbol{\mathcal{X}}^{(j)}|\,a_0(1 + \Delta), \qquad \Delta = (a_1 - a_0)/(2a_0),$$
and declares it to be defective otherwise. That is, the decoder declares an item to be non-defective if the empirically observed fraction of positive test outcomes involving that item is close to the expected value $a_0$. The probability of this event is calculated in Lemma 7.
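Both decoding steps are threshold comparisons on empirical fractions. The minimal sketch below assumes that the expected fractions $\theta_{l-1}, \theta_l, \theta_{l+1}$ and $a_0, a_1$ have already been precomputed from the model; computing them is not shown. The function names are hypothetical.

```python
from typing import Sequence


def is_critical(outcomes: Sequence[int], theta_lm1: float, theta_l: float, theta_lp1: float) -> bool:
    """Decoder step 1: declare R_{p,r} critical if the fraction of positive
    outcomes lies in [theta_l(1 - Delta_<l), theta_l(1 + Delta_>l)]."""
    frac = sum(outcomes) / len(outcomes)
    delta_below = (theta_l - theta_lm1) / (2 * theta_l)
    delta_above = (theta_lp1 - theta_l) / (2 * theta_l)
    # The two bounds equal the midpoints (theta_{l-1} + theta_l)/2 and (theta_l + theta_{l+1})/2.
    return theta_l * (1 - delta_below) <= frac <= theta_l * (1 + delta_above)


def is_defective(outcomes: Sequence[int], a0: float, a1: float) -> bool:
    """Decoder step 2: declare x_j defective unless the number of positive
    outcomes falls below |X^(j)| * a0 * (1 + Delta), Delta = (a1 - a0)/(2 a0)."""
    delta = (a1 - a0) / (2 * a0)
    return sum(outcomes) >= len(outcomes) * a0 * (1 + delta)
```

For example, with $(\theta_{l-1}, \theta_l, \theta_{l+1}) = (0.1, 0.4, 0.7)$, the accepted range of the empirical fraction is $[0.25, 0.55]$.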

IV. Proof of Theorem 1

Definition 1: The hypergeometric distribution describes the probability of picking $v$ defective items when we pick $s$ distinct items from $n$ items with $d$ defectives. Its probability mass function is
$$\Pr(v) = \frac{\binom{d}{v}\binom{n-d}{s-v}}{\binom{n}{s}}.$$
Lemma 4: The probability of picking $l$ defective items when we sample $nl/d$ items from $n$ items with $d$ defective items is at least $\frac{4\pi^2}{e^5}\sqrt{\frac{1}{l}\cdot\frac{d}{d-l}\cdot\frac{n}{n-d}}$.

Fig. 6. Probability of sampling $l$ defective items as a function of the size of the group undergoing a threshold test (vertical axis: probability of picking $l$ defective items; horizontal axis: size of sampling).

Proof: The probability that a test of a certain size has exactly $l$ defective items follows the hypergeometric distribution given in Definition 1. Stirling's approximation [17] gives, for all $n \in \mathbb{N}^+$, $1 < n!\,(2\pi n)^{-1/2}(n/e)^{-n} < e(2\pi)^{-1/2}$. This implies that for all $k \in \mathbb{N}^+$ with $k < n$,
$$\frac{\sqrt{2\pi}}{e^2} < \binom{n}{k}\sqrt{\frac{k(n-k)}{n}}\left(\frac{k}{n}\right)^{k}\left(\frac{n-k}{n}\right)^{n-k} < \frac{e}{2\pi}.$$
Applying these bounds to the three binomial coefficients in Definition 1, with $s = nl/d$ and $v = l$, shows that when the number of items in the test equals $nl/d$ this probability scales as
$$\frac{\binom{d}{l}\binom{n-d}{nl/d-l}}{\binom{n}{nl/d}} \ge \frac{4\pi^2}{e^5}\sqrt{\frac{1}{l}\cdot\frac{d}{d-l}\cdot\frac{n}{n-d}}. \tag{1}$$
Note that the exponential terms from Stirling's approximation of the binomial coefficients cancel exactly in (1). ■
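A quick numerical sanity check of Lemma 4 in Python compares the exact hypergeometric probability with the lower bound $\frac{4\pi^2}{e^5}\sqrt{\frac{1}{l}\cdot\frac{d}{d-l}\cdot\frac{n}{n-d}}$. This bound follows from the Stirling bounds quoted in the proof. The parameter triples are arbitrary examples chosen so that $nl/d$ is an integer, and the helper names are our own.

```python
from math import comb, e, pi, sqrt


def exact_critical_prob(n: int, d: int, l: int) -> float:
    """Exact probability that nl/d items sampled from n (d defective) contain exactly l defectives."""
    s = n * l // d
    return comb(d, l) * comb(n - d, s - l) / comb(n, s)


def lemma4_lower_bound(n: int, d: int, l: int) -> float:
    """Stirling-based lower bound (4 pi^2 / e^5) * sqrt((1/l) * d/(d-l) * n/(n-d))."""
    return (4 * pi ** 2 / e ** 5) * sqrt((1 / l) * (d / (d - l)) * (n / (n - d)))


if __name__ == "__main__":
    for n, d, l in [(20_000, 200, 4), (20_000, 200, 16), (10_000, 100, 10)]:
        print(n, d, l, round(exact_critical_prob(n, d, l), 4), round(lemma4_lower_bound(n, d, l), 4))
```

The constant $4\pi^2/e^5 \approx 0.266$ is somewhat loose compared with the central-limit value $1/\sqrt{2\pi} \approx 0.399$. Only the $1/\sqrt{l}$ dependence matters for the test count.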

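Lemma 5: With probability at least $1 - \epsilon_2$, for each $p \in \{1, \dots, P\}$, $\mathcal{P}_p$ has at least one critical reference group when
$$R > \frac{\ln\left(\frac{(d-l)\,\epsilon_2}{2d}\right)}{\ln\left(1 - \frac{4\pi^2}{e^6}\sqrt{\frac{1}{l}\cdot\frac{d+l}{d-l}\cdot\frac{n}{n-d}}\right)}.$$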



Proof: Let Ei (p) be the event that a specific Pp has "too 
many" defective items, i.e., ||Pp fl P|| > (1 ± (5) 71 d for any 
particular p. Let Ei be the event corresponding to the union 
of Ei(p), i.e., that at least one division has too many defective 
items. Let E2 be the event that there exists a Pp which contains 
no critical reference group. First, we compute the union bound 
of probability of Ei for all p as 



PPr(Ei) < 



exp 



< 



Inequality ^ 
that 71 = 



4d 

follows from 

[d + l)/{2d). 



2(n + 2)(((57id)^-l) 
(n-d+l)(d+l) 



exp(-((57i)'^) 
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= 0(1/75), 



Note 
than 
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V((4d)/(d + 02)ln((4d)/((d-0ei)) ^ 
bounded from above by a constant ei. 

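Let $\Pr_l(v) = \Pr\big(|\mathcal{R}_{p,r} \cap \mathcal{D}| = l \,\big|\, |\mathcal{P}_p \cap \mathcal{D}| = v\big)$, i.e., the probability that we pick a critical reference group from $\mathcal{P}_p$ given that $\mathcal{P}_p$ contains $v$ defective items. Let $\Pr(v) = \Pr(|\mathcal{P}_p \cap \mathcal{D}| = v)$, i.e., the probability that $\mathcal{P}_p$ contains $v$ defective items. We wish to bound $\Pr_l((1+\delta)\gamma_1 d)$ and $\Pr_l((1-\delta)\gamma_1 d)$. The ratio of $\Pr_l(\gamma_1 d)$ to $\Pr_l((1+\delta)\gamma_1 d)$ can be computed as
$$\frac{\Pr_l(\gamma_1 d)}{\Pr_l((1+\delta)\gamma_1 d)} \overset{(b)}{=} \frac{\binom{\gamma_1 d}{l}\binom{\gamma_1 n - \gamma_1 d}{nl/d - l}}{\binom{(1+\delta)\gamma_1 d}{l}\binom{\gamma_1 n - (1+\delta)\gamma_1 d}{nl/d - l}} \tag{4}$$
$$\le \left(1 + \frac{l(n-d)/d}{(n-d)(\gamma_1 - l/d) - \gamma_1 d\delta}\right)^{\gamma_1 d\delta} \overset{(c)}{\le} \exp\left(\frac{\gamma_1 d\delta \cdot l(n-d)/d}{(n-d)(\gamma_1 - l/d) - \gamma_1 d\delta}\right). \tag{5}$$
Equality (b) follows by noting that $\Pr_l(v)$ follows a hypergeometric distribution. Inequality (c) follows from $(1+x)^y \le e^{xy}$ for $|x| < 1$ and $y > 0$. For $\delta = o(1/\sqrt{d})$ and $l = o(d)$, (5) is bounded from above by $e$. Applying the same technique to the ratio of $\Pr_l(\gamma_1 d)$ to $\Pr_l((1-\delta)\gamma_1 d)$ shows that this ratio is also less than $e$. We now apply Lemma 4 to $\mathcal{P}_p$, which has $\gamma_1 n$ items of which about $\gamma_1 d$ are defective. Together with these bounds on the ratios, this gives
$$\Pr_l((1\pm\delta)\gamma_1 d) \ge \frac{\Pr_l(\gamma_1 d)}{e} \ge \frac{4\pi^2}{e^6}\sqrt{\frac{1}{l}\cdot\frac{d+l}{d-l}\cdot\frac{n}{n-d}}. \tag{6}$$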
Finally, we bound the probability of $E_2$ (defined at the beginning of the proof) as
$$\Pr(E_2) \le P\sum_{v=0}^{d}\Pr(v)\left(1 - \Pr_l(v)\right)^R$$
$$\le P\Pr(E_1(p)) + P\sum_{v=(1-\delta)\gamma_1 d}^{(1+\delta)\gamma_1 d}\Pr(v)\left(1 - \Pr_l(v)\right)^R$$
$$\le P\Pr(E_1(p)) + P\left(1 - \Pr_l((1\pm\delta)\gamma_1 d)\right)^R. \tag{7}$$
We now substitute (3) and (6) into (7). Note that for large enough $d$, the constant $\epsilon_1$, which bounds the quantity in (3), can be made arbitrarily small. We then require the second term of (7) to be at most a constant $\epsilon_2$. For large enough $d$ and $R$, (7) can then be made smaller than any $\epsilon_2$, and we obtain Lemma 5. ■
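The number of reference groups per division can be computed numerically. The sketch below uses hypothetical function names, and its example parameters are arbitrary. It takes $R$ to be the smallest integer with $P(1-q)^R \le \epsilon_2$, where $P = 2d/(d-l)$. Here $q = \frac{4\pi^2}{e^6}\sqrt{\frac{1}{l}\cdot\frac{d+l}{d-l}\cdot\frac{n}{n-d}}$ is the lower bound on the probability that a single reference group is critical. Multiplying $l$ by 4 (doubling $\sqrt{l}$) roughly doubles $R$, which matches the $O(\sqrt{l})$ scaling from Section II.

```python
from math import ceil, e, log, pi, sqrt


def critical_prob_lower_bound(n: int, d: int, l: int) -> float:
    """Lower bound q on the probability that one reference group drawn
    from P_p is critical (the right-hand side of (6))."""
    return (4 * pi ** 2 / e ** 6) * sqrt((1 / l) * ((d + l) / (d - l)) * (n / (n - d)))


def reference_groups_needed(n: int, d: int, l: int, eps2: float) -> int:
    """Smallest integer R satisfying P * (1 - q)^R <= eps2, with P = 2d/(d - l)."""
    P = 2 * d / (d - l)
    q = critical_prob_lower_bound(n, d, l)
    return ceil(log(P / eps2) / -log(1 - q))


if __name__ == "__main__":
    for l in (16, 64):
        print(l, reference_groups_needed(10 ** 6, 10 ** 4, l, 0.01))
```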

Lemma 6: With probability at least $1 - \epsilon_3$, the decoder correctly determines whether each $\mathcal{R}_{p,r}$ is critical or not when
$$I > 8e^2\left(\ln(RP) + \ln\frac{1}{\epsilon_3}\right).$$

Proof: There are three types of reference groups. We call a reference group promising if it contains at most $l - 1$ defective items, and misleading if it contains at least $l + 1$ defective items. Finally, recall that a reference group is critical if it contains exactly $l$ defective items. The error events comprise four kinds of misclassification:
- $\Pr^{p\to c}$ denotes the probability of misclassifying a promising $\mathcal{R}_{p,r}$ as critical;
- $\Pr^{c\to p}$ denotes the probability of misclassifying a critical $\mathcal{R}_{p,r}$ as promising;
- $\Pr^{m\to c}$ denotes the probability of misclassifying a misleading $\mathcal{R}_{p,r}$ as critical;
- $\Pr^{c\to m}$ denotes the probability of misclassifying a critical $\mathcal{R}_{p,r}$ as misleading.
Each of these error probabilities can be bounded using the tail of a binomial distribution. The first error event can be bounded as
$$\Pr^{p\to c} \le \Pr\left(\bigcup_{p,r:\,|\mathcal{R}_{p,r}\cap\mathcal{D}|<l}\left\{\sum_{i=1}^{I} y^{(0)}_{p,r,i} \ge I\theta_l(1-\Delta_{<l})\right\}\right)$$
$$\le RP\sum_{t=I\theta_l(1-\Delta_{<l})}^{I}\binom{I}{t}\theta_{l-1}^{t}(1-\theta_{l-1})^{I-t} \tag{8}$$
$$\le RP\exp\left(-2I\left(\theta_l(1-\Delta_{<l}) - \theta_{l-1}\right)^2\right). \tag{9}$$



In inequality (8), we take a union bound over all $\mathcal{R}_{p,r}$. The summation is over the tail of a binomial distribution, corresponding to the event that a promising reference group "behaves like" a critical reference group. Inequality (9) follows from the Chernoff bound. Similarly, for the other error events we have
$$\Pr^{c\to p} \le RP\exp\left(-2I(\theta_l\Delta_{<l})^2\right), \tag{10}$$
$$\Pr^{m\to c} \le RP\exp\left(-2I\left(\theta_{l+1} - \theta_l(1+\Delta_{>l})\right)^2\right), \tag{11}$$
$$\Pr^{c\to m} \le RP\exp\left(-2I(\theta_l\Delta_{>l})^2\right). \tag{12}$$
Within the valid range of $\Delta_{<l}$, $\Pr^{p\to c}$ is strictly increasing as a function of $\Delta_{<l}$, whereas $\Pr^{c\to p}$ is strictly decreasing. A good choice of $\Delta_{<l}$ allows a "small" choice of $I$ while keeping both $\Pr^{p\to c}$ and $\Pr^{c\to p}$ "small". The same argument holds for $\Delta_{>l}$ with respect to $\Pr^{m\to c}$ and $\Pr^{c\to m}$. The specific choices of $\Delta_{<l}$ and $\Delta_{>l}$ that we use are
$$\Delta_{<l} = (\theta_l - \theta_{l-1})/(2\theta_l), \tag{13}$$
$$\Delta_{>l} = (\theta_{l+1} - \theta_l)/(2\theta_l). \tag{14}$$
With these choices, (9) balances (10), and (11) balances (12).

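Let $\Pr^{(0)}_{w|v} = \Pr\big(|(\mathcal{X}_i^{(0)} \setminus \mathcal{R}_{p,r}) \cap \mathcal{D}| = w \,\big|\, |\mathcal{R}_{p,r} \cap \mathcal{D}| = v\big)$. This is the probability that, conditioned on the reference group $\mathcal{R}_{p,r}$ containing $v$ defective items, a randomly chosen indicator group $\mathcal{X}_i^{(0)}$ from the $i$-th family contains exactly $w$ defective items that are not in $\mathcal{R}_{p,r}$. Hence, under the Bernoulli gap model, the conditional probabilities of a positive outcome can be expanded as
$$\theta_{l-1} = \frac{1}{2}\sum_{w=2}^{g+1}\Pr^{(0)}_{w|l-1} + \sum_{w\ge g+2}\Pr^{(0)}_{w|l-1}, \tag{15}$$
$$\theta_{l} = \frac{1}{2}\sum_{w=1}^{g}\Pr^{(0)}_{w|l} + \sum_{w\ge g+1}\Pr^{(0)}_{w|l}, \tag{16}$$
$$\theta_{l+1} = \frac{1}{2}\sum_{w=0}^{g-1}\Pr^{(0)}_{w|l+1} + \sum_{w\ge g}\Pr^{(0)}_{w|l+1}. \tag{17}$$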



The corresponding summations in (15)–(17) are "close" to each other, so we ignore their differences in the following calculations. Only pairwise differences of the $\theta$'s are required, and the differences between the summations contribute only lower-order terms. For example, the difference between $\Pr^{(0)}_{v|l}$ and $\Pr^{(0)}_{v|l-1}$ satisfies
$$\Pr^{(0)}_{v|l} - \Pr^{(0)}_{v|l-1} \le \frac{v}{d-l+v} \quad \text{when } v = o(d), \tag{18}$$
which is asymptotically negligible as $d$ grows without bound.



The quantity $2\theta_l\Delta_{>l}$ in (12) is then bounded by using (14), (16) and (17), and noting that
$$4\theta_l\Delta_{>l} = 2(\theta_{l+1} - \theta_l) \gtrsim \Pr^{(0)}_{0|l} \gtrsim \left(1 - \frac{d-l}{n-m+1}\right)^{m} \ge \exp\left(-\frac{(d-l)\,m}{n-m+1-(d-l)}\right), \tag{19}$$
where $m = \gamma_2 n/(d-l)$ is the size of an indicator group. The last inequality in (19) follows from the fact that $(1+x)^{-y} \ge e^{-xy}$ for $|x| < 1$ and $y > 0$.

The quantity $2\theta_l\Delta_{<l}$ in (10) can be bounded in a similar manner:
$$4\theta_l\Delta_{<l} = 2(\theta_l - \theta_{l-1}) \gtrsim \Pr^{(0)}_{1|l-1}. \tag{20}$$
Finally, we substitute the results from (13)–(20) into (9)–(12). The requirement that the probability of misclassifying any reference group be at most $\epsilon_3$ implies
$$I > 8\max\left(\left(\Pr^{(0)}_{0|l}\right)^{-2}, \left(\Pr^{(0)}_{1|l-1}\right)^{-2}\right)\left(\ln(RP) + \ln\frac{1}{\epsilon_3}\right).$$
For $d = o(n)$ and large enough $n$, (19) behaves as $e^{-\gamma_2}$ and (20) behaves as $\gamma_2 e^{-\gamma_2}$. The quantity $I$ is therefore minimized when $\gamma_2 = 1$, which gives the result in Lemma 6. ■
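The choice $\gamma_2 = 1$ can be checked numerically. Using the asymptotic forms $e^{-\gamma_2}$ and $\gamma_2 e^{-\gamma_2}$ for (19) and (20), the factor multiplying $8(\ln(RP) + \ln(1/\epsilon_3))$ is $\max(e^{2\gamma_2}, \gamma_2^{-2}e^{2\gamma_2})$. Over $\gamma_2 \in (0, 1]$ this factor is smallest at $\gamma_2 = 1$, where it equals $e^2$. The grid search and helper names below are our own illustration.

```python
from math import ceil, e, exp, log


def indicator_factor(gamma2: float) -> float:
    """max(Pr_{0|l}^-2, Pr_{1|l-1}^-2) using the asymptotic forms
    Pr_{0|l} ~ exp(-gamma2) and Pr_{1|l-1} ~ gamma2 * exp(-gamma2)."""
    return max(exp(2 * gamma2), exp(2 * gamma2) / gamma2 ** 2)


def lemma6_families(R: int, P: float, eps3: float) -> int:
    """Smallest integer I >= 8 e^2 (ln(RP) + ln(1/eps3)), i.e. Lemma 6 with gamma2 = 1."""
    return ceil(8 * e ** 2 * (log(R * P) + log(1 / eps3)))


if __name__ == "__main__":
    grid = [k / 1000 for k in range(1, 1001)]  # gamma2 in (0, 1]
    best = min(grid, key=indicator_factor)
    print(best, indicator_factor(best), lemma6_families(100, 2.0, 0.01))
```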



Lemma 7: With error probability at most $\epsilon_4$, the decoder correctly determines whether each item $x_j$ is defective or non-defective when
$$I > 8e^2\left(\ln(n) + \ln\frac{1}{\epsilon_4}\right).$$



Proof: Both the rule for deciding whether an item is defective and the rule for deciding whether a reference group is critical match empirically observed test outcome statistics against precomputed thresholds. The proof here is therefore essentially the same as the proof of Lemma 6, and we outline the major changes below.

The error event includes both false positives (misclassifying non-defective items as defective) and false negatives (misclassifying defective items as non-defective). The probability of false negatives can be bounded as






$$\Pr(\text{false negative}) \leq n\exp\Big(-2I\big(\alpha_1 - \alpha_0(1-\Delta)\big)^2\Big). \qquad (21)$$



In a similar manner, the probability of false positives can be 
computed as 



$$\Pr(\text{false positive}) \leq n\exp\big(-2I(\alpha_0\Delta)^2\big). \qquad (22)$$



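Let $\Pr_{v\mid 0} = \Pr\big(|X^{(1)}_p \cap \mathcal{D}| = v \mid x_j = 0\big)$ and $\Pr_{v\mid 1} = \Pr\big(|X^{(1)}_p \cap \mathcal{D}| = v \mid x_j = 1\big)$. A good choice of $\Delta$ is

$$\Delta = \frac{\alpha_1 - \alpha_0}{2\alpha_0}. \qquad (23)$$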
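With the choice (23), $\alpha_0\Delta = (\alpha_1 - \alpha_0)/2$, so the false-positive bound (22) becomes $n\exp(-I(\alpha_1-\alpha_0)^2/2)$. Requiring this to be at most $\varepsilon_4$ gives $I \geq 8(\ln(n) + \ln(1/\varepsilon_4))/(2(\alpha_1-\alpha_0))^2$. A small Python sketch of this arithmetic follows; only bound (22) is evaluated, and the values of $n$, $\varepsilon_4$, $\alpha_0$ and $\alpha_1$ in the usage are hypothetical:

```python
import math


def delta_choice(alpha0: float, alpha1: float) -> float:
    """Delta = (alpha1 - alpha0) / (2 alpha0), the choice in (23)."""
    return (alpha1 - alpha0) / (2 * alpha0)


def false_positive_bound(n: int, I: int, alpha0: float, alpha1: float) -> float:
    """Right-hand side of (22): n exp(-2 I (alpha0 Delta)^2)."""
    delta = delta_choice(alpha0, alpha1)
    return n * math.exp(-2 * I * (alpha0 * delta) ** 2)


def min_families(n: int, eps4: float, alpha0: float, alpha1: float) -> int:
    """Smallest integer I with false_positive_bound(n, I, ...) <= eps4,
    using alpha0 * Delta = (alpha1 - alpha0) / 2."""
    gap = 2 * (alpha1 - alpha0)
    return math.ceil(8 * (math.log(n) + math.log(1 / eps4)) / gap ** 2)


I = min_families(n=1000, eps4=0.01, alpha0=0.3, alpha1=0.5)
print(I)                                        # 576
print(false_positive_bound(1000, I, 0.3, 0.5))  # just below 0.01
```

If $2(\alpha_1 - \alpha_0)$ is of order $e^{-1}$ (as with $\gamma = 1$), the same expression takes the $8e^2(\ln(n) + \ln(1/\varepsilon_4))$ form stated in Lemma 7.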

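We may expand $\alpha_0$ and $\alpha_1$ in terms of $\Pr_{v\mid 0}$ and $\Pr_{v\mid 1}$ as

$$\alpha_0 = \sum_{v=g}^{\cdots}\cdots, \qquad \alpha_1 = \sum_{v=g}^{\cdots}\cdots.$$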



The difference between $\alpha_0$ and $\alpha_1$ can be computed as

$$2(\alpha_1 - \alpha_0) \geq \Pr_{\cdots} \qquad (24)$$

$$= \prod\cdots \geq \cdots \geq \exp\Big(-\cdots\Big). \qquad (25)$$



Relation (24) ignores the small difference arising from the summation terms. Inequality (25) follows from the fact that $(1+x)^{-y} \geq e^{-xy}$ for $|x| < 1$ and $y > 0$.



Finally, we substitute (23) and (25) into (21) and (22), and set $\gamma = 1$ as in Lemma 6. The requirement that the error probability of misclassifying any item be at most $\varepsilon_4$ implies the result in Lemma 7. ■

Proof of Theorem 1: A sufficient condition for decoding all items with high probability is that Lemmas 5-7 are all satisfied. Therefore, with error probability at most $\varepsilon_2 + \varepsilon_3 + \varepsilon_4$, the total number of tests $T$ is $RP(d-l)I$, where $P = 2d/(d-l)$, $R$ is specified in Lemma 5, and $I$, which must satisfy both Lemma 6 and Lemma 7, is set as



$$I \geq 8e^2\Big(\ln(n) + \ln\frac{1}{\varepsilon_3} + \ln\frac{1}{\varepsilon_4}\Big).$$



Explicitly, $T$ is at least $(4e^2\ln(2)/\pi^2)\ln(1/\varepsilon_2)\,d\sqrt{l}\,\ln(n) + O(\ln(1/(\varepsilon_3\varepsilon_4))\,d\sqrt{l})$. As to the computational complexity of decoding, recall that the first decoding step decodes a reference group by counting the empirical fraction of positive outcomes from $I$ indicator groups, and the second decoding step decodes an item in the same way. Therefore, given that there are $PR$ reference groups and $n$ items, the complexity is $I(n + PR)$, which is $O(n\ln(n)) + O(\ln(1/\varepsilon_2)\ln(n)) + O(n\ln(1/(\varepsilon_3\varepsilon_4)))$. Letting $\varepsilon = \max(\varepsilon_2, \varepsilon_3, \varepsilon_4)$, we obtain Theorem 1. ■
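The test count and decoding cost in the proof above can be tallied as follows. This is a sketch under stated assumptions: $R$ is passed in as a parameter because Lemma 5 is not reproduced in this section, $I$ is taken at the combined bound $8e^2(\ln(n) + \ln(1/\varepsilon_3) + \ln(1/\varepsilon_4))$, and the function names and example values are illustrative only.

```python
import math
from typing import Tuple


def families(n: int, eps3: float, eps4: float) -> int:
    """Number of indicator-group families I, rounded up from
    8 e^2 (ln n + ln(1/eps3) + ln(1/eps4))."""
    return math.ceil(8 * math.e ** 2 * (math.log(n) + math.log(1 / eps3) + math.log(1 / eps4)))


def nonadaptive_cost(n: int, d: int, l: int, R: int,
                     eps3: float, eps4: float) -> Tuple[int, float]:
    """Return (T, decoding_ops) for the non-adaptive scheme.

    R (from Lemma 5) is an input here. With P = 2d / (d - l), the number of
    tests R * P * (d - l) * I simplifies to 2 d R I. Decoding counts positives
    over I outcomes for each of the n items and PR reference groups."""
    I = families(n, eps3, eps4)
    P = 2 * d / (d - l)
    T = 2 * d * R * I          # equals R * P * (d - l) * I
    ops = I * (n + P * R)      # I(n + PR)
    return T, ops


T, ops = nonadaptive_cost(n=10**6, d=100, l=4, R=10, eps3=0.01, eps4=0.01)
print(T, round(ops))
```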

V. Proof sketches of Theorems 2 and 3

Slight modifications of TGT-BERN-NONA yield an adaptive algorithm (as in Theorem 2), as well as an algorithm for threshold group testing models in which the probability of a positive outcome is a monotonically nondecreasing function of the number of defective items being tested. As a demonstration, we present a two-stage adaptive algorithm, TGT-BERN-ADA, and a non-adaptive algorithm, TGT-LIN-NONA, that works under the "linear model" of stochastic threshold group testing (as in Figure 2).

A. TGT-BERN-ADA 

In the first stage, we aim to find multiple critical reference groups. This is done by first performing encoding step 1 of TGT-BERN-NONA and then constructing a set of indicator groups called $X^{(0)}$. The construction of $X^{(0)}$ is, however, slightly different from the definition given in TGT-BERN-NONA. Here $X^{(0)} = \{X^{(0)}_i : i = 1, 2, \ldots, I_1\}$, where each $X^{(0)}_i$ is a group of $\gamma n/d$ distinct items picked at random from $\mathcal{N}$. For every pair consisting of a reference group from $\mathcal{R}$ and an indicator group from $X^{(0)}$, the two groups are pooled together and a threshold group test is performed. Whether a particular reference group is critical is then inferred as in decoding step 1 of TGT-BERN-NONA, but using the definition and parameters of $X^{(0)}$ given in this paragraph.

The second stage follows steps 2 and 3 of TGT-BERN-NONA. However, only the reference groups decoded as critical in the first stage are tested in step 3; hence the multiplicative factor of $\sqrt{l}$ does not appear in the overall number of tests. To avoid confusion with TGT-BERN-NONA, the total number of families in the set of indicator groups (denoted $I$ in TGT-BERN-NONA) is denoted $I_2$ here. Finally, to decide whether individual items are defective, TGT-BERN-ADA uses decoding step 2 of TGT-BERN-NONA.
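The first stage of TGT-BERN-ADA can be sketched as a test schedule. In this Python sketch the reference groups are supplied by the caller (encoding step 1 of TGT-BERN-NONA is not reproduced in this section), the group size $\gamma n/d$ is rounded to the nearest integer, and "pooling" two groups is modelled as taking their union; these choices, and the function names, are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def indicator_groups(n: int, d: int, gamma: float, I1: int,
                     rng: random.Random) -> List[List[int]]:
    """X^(0): I1 groups, each of about gamma * n / d distinct items drawn
    uniformly at random from the n items (indexed 0..n-1)."""
    size = max(1, round(gamma * n / d))
    return [rng.sample(range(n), size) for _ in range(I1)]


def stage1_tests(reference_groups: Sequence[Sequence[int]],
                 indicators: Sequence[Sequence[int]]) -> List[Tuple[int, int, List[int]]]:
    """One threshold test per (reference group, indicator group) pair; each test
    is recorded as (reference index, indicator index, pooled items)."""
    return [(r, i, sorted(set(ref) | set(ind)))
            for r, ref in enumerate(reference_groups)
            for i, ind in enumerate(indicators)]


rng = random.Random(0)
refs = [rng.sample(range(1000), 20) for _ in range(3)]  # hypothetical reference groups
X0 = indicator_groups(n=1000, d=50, gamma=1.0, I1=4, rng=rng)
tests = stage1_tests(refs, X0)
print(len(tests))  # 12 = 3 reference groups x 4 indicator groups
```

The number of first-stage tests is the number of reference groups times $I_1$, matching the $RPI_1$ term in the proof sketch of Theorem 2.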

Proof sketch of Theorem 2: A sufficient condition for decoding all items with high probability is that $R$ satisfies Lemma 5, $I_1$ satisfies Lemma 6, and $I_2$ satisfies Lemma 7. Therefore, with error probability at most $\varepsilon_2 + \varepsilon_3 + \varepsilon_4$, the total number of tests $T$ is $RPI_1 + P(d-l)I_2$, which is at least $16e^2 d\ln(n) + O(\ln(1/\varepsilon_2)\sqrt{l}\ln(l)) + O(\ln(1/\varepsilon_3)\sqrt{l}) + O(\ln(1/\varepsilon_4)d)$. The computational complexity of decoding is $nI_2 + PRI_1$, which is $O(n\ln(n)) + O(\ln(1/\varepsilon_2)\sqrt{l}\ln(l)) + O(\ln(1/\varepsilon_3)\sqrt{l}) + O(\ln(1/\varepsilon_4)n)$. Setting $\varepsilon = \max(\varepsilon_2, \varepsilon_3, \varepsilon_4)$, we obtain Theorem 2.
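The two-stage total $T = RPI_1 + P(d-l)I_2$ can be evaluated numerically as below. This is a hedged sketch: $R$ is an input (Lemma 5 is not reproduced here), $I_1$ is assumed to take the form $8e^2(\ln(RP) + \ln(1/\varepsilon_3))$ (the Lemma 6 requirement with $\gamma = 1$), and $I_2 = 8e^2(\ln(n) + \ln(1/\varepsilon_4))$ as in Lemma 7; the example values are hypothetical.

```python
import math

E2 = math.e ** 2


def adaptive_tests(n: int, d: int, l: int, R: int, eps3: float, eps4: float) -> float:
    """T = R P I1 + P (d - l) I2 for the two-stage scheme, with P = 2d / (d - l)."""
    P = 2 * d / (d - l)
    I1 = math.ceil(8 * E2 * (math.log(R * P) + math.log(1 / eps3)))  # stage 1 (assumed Lemma 6 form)
    I2 = math.ceil(8 * E2 * (math.log(n) + math.log(1 / eps4)))      # stage 2 (Lemma 7)
    return R * P * I1 + 2 * d * I2                                   # P (d - l) = 2d


n, d = 10**9, 100
T = adaptive_tests(n, d, l=4, R=10, eps3=0.01, eps4=0.01)
leading = 16 * E2 * d * math.log(n)  # the 16 e^2 d ln(n) term
print(round(T), round(leading))
```

For large $n$ the second-stage term $2dI_2$ dominates, which is why the leading term is $16e^2 d\ln(n)$ without a $\sqrt{l}$ factor.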
B. TGT-LIN-NONA

The testing scheme follows that of TGT-BERN-NONA exactly. The decoding scheme is based on that of TGT-BERN-NONA, with the following changes:

1) Estimation of the number of defectives in a reference group: We first estimate the number of defective items in a single reference group from the empirical probability that tests involving the reference group yield positive outcomes. More precisely, let $\theta_v = \Pr\big(y^{(0)}_{p,r,i} = 1 \mid |\mathcal{R}_{p,r} \cap \mathcal{D}| = v\big)$ be the expected fraction of positive test outcomes. The decoder declares that $\mathcal{R}_{p,r}$ contains $v$ defective items if

$$|X^{(0)}|\,\theta_v(1 - \Delta^{<}_v) < \sum_i y^{(0)}_{p,r,i} < |X^{(0)}|\,\theta_v(1 + \Delta^{>}_v),$$

where the "variation" $\Delta^{<}_v$ around the expectation is set to $(\theta_v - \theta_{v-1})/(2\theta_v)$ and $\Delta^{>}_v = (\theta_{v+1} - \theta_v)/(2\theta_v)$.

2) Estimation of defectiveness of items: In this case, the threshold for declaring an item defective differs from that in TGT-BERN-NONA, since the difference between the empirical probabilities of observing positive outcomes in tests that include the reference group is different. Specifically, for every $\mathcal{R}_{p,r} \in \mathcal{R}_p$ that contains $v$ defective items, let $\alpha_{v,0}$ and $\alpha_{v,1}$ denote the expected fractions of positive test outcomes when $x_j$ is non-defective and defective, respectively. If

$$\sum_i y^{(1)}_{p,r,i} < |X^{(1)}|\,\alpha_{v,0}(1 + \Delta_v),$$

where $\Delta_v = (\alpha_{v,1} - \alpha_{v,0})/(2\alpha_{v,0})$, the decoder declares $x_j$ to be non-defective; otherwise it declares $x_j$ to be defective.
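Both decision rules reduce to comparing an empirical fraction with midpoints: $\theta_v(1-\Delta^{<}_v) = (\theta_{v-1}+\theta_v)/2$ and $\theta_v(1+\Delta^{>}_v) = (\theta_v+\theta_{v+1})/2$, so adjacent intervals share endpoints, and likewise $\alpha_{v,0}(1+\Delta_v) = (\alpha_{v,0}+\alpha_{v,1})/2$. A Python sketch of the two rules follows. The $\theta$ values in the usage are hypothetical (they are not the paper's linear model), and treating the smallest and largest $v$ as having open-ended intervals is an assumption made here.

```python
import math
from typing import Optional, Sequence


def estimate_defectives(positives: int, num_tests: int,
                        theta: Sequence[float], v_min: int = 0) -> Optional[int]:
    """Rule 1: theta[k] is the positive-outcome probability theta_v for
    v = v_min + k (strictly increasing). Returns the v whose open interval
    ((theta_{v-1} + theta_v)/2, (theta_v + theta_{v+1})/2) contains the
    empirical fraction, or None if it falls exactly on a boundary."""
    frac = positives / num_tests
    for k, t in enumerate(theta):
        lower = (theta[k - 1] + t) / 2 if k > 0 else -math.inf
        upper = (t + theta[k + 1]) / 2 if k + 1 < len(theta) else math.inf
        if lower < frac < upper:
            return v_min + k
    return None


def is_defective(positives: int, num_tests: int,
                 alpha_v0: float, alpha_v1: float) -> bool:
    """Rule 2: x_j is non-defective iff positives < num_tests * alpha_v0 * (1 + Delta_v),
    with Delta_v = (alpha_v1 - alpha_v0) / (2 alpha_v0); otherwise it is defective."""
    delta_v = (alpha_v1 - alpha_v0) / (2 * alpha_v0)
    return positives >= num_tests * alpha_v0 * (1 + delta_v)


theta = [0.1, 0.3, 0.5, 0.7, 0.9]  # hypothetical theta_v for v = 2, ..., 6
print(estimate_defectives(31, 100, theta, v_min=2))       # 3
print(is_defective(45, 100, alpha_v0=0.3, alpha_v1=0.5))  # True
```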

Proof sketch of Theorem 3: We note that Lemma 5 is not required, since any reference group $\mathcal{R}$ can be used to decode items as long as the number of defective items in $\mathcal{R}$ is between $l$ and $u$, so that there is some statistical difference between the probability of a positive test outcome when a group has $i$ defective items and when it has $i+1$ defective items. With high probability, a reference group of suitably chosen size ($n(u+l)/(2d)$) satisfies this relaxed condition. That is, the number of reference groups $R$ required is a constant.
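The claim that a reference group of size $n(u+l)/(2d)$ usually contains between $l$ and $u$ defectives can be checked exactly if, as assumed in this sketch, the group is a uniformly random subset of the $n$ items, so that its number of defectives is hypergeometric with mean $(u+l)/2$. The parameter values below are hypothetical:

```python
from math import comb


def prob_defectives_in_range(n: int, d: int, m: int, lo: int, hi: int) -> float:
    """P(lo <= |G intersect D| <= hi) for a uniformly random m-subset G of n items,
    d of which are defective (hypergeometric distribution)."""
    top = min(hi, d, m)  # comb(n - d, m - v) requires m - v >= 0
    num = sum(comb(d, v) * comb(n - d, m - v) for v in range(max(lo, 0), top + 1))
    return num / comb(n, m)  # exact integer arithmetic until this division


n, d, l, u = 10_000, 100, 10, 40
m = n * (u + l) // (2 * d)  # reference-group size n(u + l)/(2d) = 2500
p = prob_defectives_in_range(n, d, m, l, u)
print(m, p)
```

Because the probability does not depend on $n$ once the ratio $d/n$ and the window $[l, u]$ are fixed, a constant number of such groups suffices, in line with the statement that $R$ is a constant.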

A sufficient condition for decoding all items with high probability is that modified versions of Lemmas 6 and 7 are satisfied. The modifications to Lemmas 6 and 7 reflect the fact that the required number of indicator groups increases by a factor of $g^2$. This is because the decoder is based on estimating the difference in the probability of a positive outcome when the test contains one additional defective item, and this difference scales as $1/g$. By the Chernoff bound, estimating such a probability difference sufficiently accurately requires a multiplicative factor of $g^2$ in the number of tests.

The rest of the argument is as in the proof of Theorem 1.